Hyper-systolic algorithms for N-body computations and parallel level-3 BLAS libraries

Author

  • Thomas Lippert
Abstract

Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their regular communication and compute patterns, they are well suited for implementation on most parallel architectures; high-performance SIMD machines in particular can benefit considerably. After a short explanation of the concept of hyper-systolic algorithms, their application to N-body computations and distributed matrix multiplication is discussed. Results from real implementations are presented.

Similar resources

Software Libraries for Linear Algebra Computations on High Performance Computers (technical paper accepted for publication in SIAM Review)

This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under develo...

Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics

Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for the maximal exploitation of the system's computational power. In this work we refer to a massively parallel processing SIMD machine (the AP...

Efficiency of Reproducible Level 1 BLAS

Numerical reproducibility failures appear in massively parallel floating-point computations. One way to guarantee the numerical reproducibility is to extend the IEEE-754 correct rounding to larger computing sequences, as for instance for the BLAS libraries. Is the overcost for numerical reproducibility acceptable in practice? We present solutions and experiments for the level 1 BLAS and we conc...

Implementing BLAS Level 3 on the CAP-II

The Basic Linear Algebra Subprogram (BLAS) library is widely used in many supercomputing applications, and is used to implement more extensive linear algebra subroutine libraries, such as LINPACK and LAPACK. The use of BLAS aids in the clarity, portability and maintenance of mathematical software. BLAS level 1 routines involve vector-vector operations, level 2 routines involve matrix-vector ope...

Journal title:
  • Parallel Computing

Volume 25

Publication date: 1999